74 research outputs found

    Hierarchical Side-Tuning for Vision Transformers

    Fine-tuning pre-trained Vision Transformers (ViT) has consistently demonstrated promising performance in the realm of visual recognition. However, adapting large pre-trained models to various tasks poses a significant challenge. This challenge arises from the need for each model to undergo an independent and comprehensive fine-tuning process, leading to substantial computational and memory demands. While recent advancements in Parameter-efficient Transfer Learning (PETL) have demonstrated their ability to achieve superior performance compared to full fine-tuning with a smaller subset of parameter updates, they tend to overlook dense prediction tasks such as object detection and segmentation. In this paper, we introduce Hierarchical Side-Tuning (HST), a novel PETL approach that enables ViT transfer to various downstream tasks effectively. Diverging from existing methods that exclusively fine-tune parameters within input spaces or certain modules connected to the backbone, we tune a lightweight and hierarchical side network (HSN) that leverages intermediate activations extracted from the backbone and generates multi-scale features to make predictions. To validate HST, we conducted extensive experiments encompassing diverse visual tasks, including classification, object detection, instance segmentation, and semantic segmentation. Notably, our method achieves state-of-the-art average Top-1 accuracy of 76.0% on VTAB-1k, all while fine-tuning a mere 0.78M parameters. When applied to object detection tasks on the COCO test-dev benchmark, HST even surpasses full fine-tuning and obtains better performance with 49.7 box AP and 43.2 mask AP using Cascade Mask R-CNN.

    ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

    In recent years, end-to-end scene text spotting approaches have been evolving toward Transformer-based frameworks. While previous studies have shown the crucial importance of the intrinsic synergy between text detection and recognition, recent advances in Transformer-based methods usually adopt an implicit synergy strategy with a shared query, which cannot fully realize the potential of these two interactive tasks. In this paper, we argue that explicit synergy that considers the distinct characteristics of text detection and recognition can significantly improve the performance of text spotting. To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Specifically, we decompose the conventional shared query into task-aware queries for text polygon and content, respectively. Through the decoder with the proposed vision-language communication module, the queries interact with each other in an explicit manner while preserving discriminative patterns of text detection and recognition, thus improving performance significantly. Additionally, we propose a task-aware query initialization scheme to ensure stable training. Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods. Code is available at https://github.com/mxin262/ESTextSpotter. Comment: Accepted to ICCV 202

    SPTS v2: Single-Point Scene Text Spotting

    End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than a single-point annotation. For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost single-point annotation by the proposed framework, termed SPTS v2. SPTS v2 retains the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD), which sequentially predicts the center points of all text instances inside the same predicting sequence, while a Parallel Recognition Decoder (PRD) handles text recognition in parallel. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Most importantly, within the scope of our SPTS v2, extensive experiments further reveal an important phenomenon: a single point serves as the optimal setting for scene text spotting compared to non-point, rectangular bounding box, and polygonal bounding box annotations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code will be available at https://github.com/bytedance/SPTSv2. Comment: arXiv admin note: text overlap with arXiv:2112.0791

    SPTS: Single-Point Text Spotting

    Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level bounding boxes). For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance. We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired detection and recognition results as a sequence of discrete tokens and use an auto-regressive Transformer to predict the sequence. The proposed method is simple yet effective and achieves state-of-the-art results on widely used benchmarks. Most significantly, we show that the performance is not very sensitive to the position of the point annotation, meaning that a point is much easier to annotate, or even to generate automatically, than a bounding box, which requires precise positions. We believe that such a pioneering attempt indicates a significant opportunity for scene text spotting applications of a much larger scale than previously possible. The code will be publicly available.

    Knockout of CAFFEOYL-COA 3-O-METHYLTRANSFERASE 6/6L enhances the S/G ratio of lignin monomers and disease resistance in Nicotiana tabacum

    Background: Nicotiana tabacum is an important economic crop that is widely planted around the world. Lignin is very important for maintaining the physiological and stress-resistant functions of tobacco. However, higher lignin content will produce lignin gas, which is not conducive to the formation of tobacco quality. To date, how to precisely fine-tune lignin content or composition remains unclear. Results: Here, we annotated and screened 14 CCoAOMTs in Nicotiana tabacum and obtained homozygous double mutants of CCoAOMT6 and CCoAOMT6L through CRISPR/Cas9 technology. The phenotype showed that the double mutants grow better than the wild type, whereas the S/G ratio increased and the total sugar decreased. The pathogen resistance test and the extract inhibition test showed that the transgenic tobacco has stronger resistance to tobacco bacterial wilt and brown spot disease, which are caused by Ralstonia solanacearum and Alternaria alternata, respectively. The combined analysis of the metabolome and transcriptome in the leaves and roots suggested that changes in phenylpropane and terpene metabolism are mainly responsible for these phenotypes. Furthermore, molecular docking indicated that the upregulated metabolites, such as soyasaponin Bb, improve disease resistance through highly stable binding with tyrosyl-tRNA synthetase targets in Ralstonia solanacearum and Alternaria alternata. Conclusions: CAFFEOYL-COA 3-O-METHYLTRANSFERASE 6/6L can regulate the S/G ratio of lignin monomers and may affect resistance to tobacco bacterial wilt and brown spot disease by disturbing phenylpropane and terpene metabolism, including metabolites such as soyasaponin Bb, in the leaves and roots of Nicotiana tabacum.

    Analysis of Influential Factors of Social Satisfaction in Food Industry

    Social satisfaction has become an important factor in helping an enterprise succeed. A new way to measure social satisfaction, called social license, has been proposed recently. Based on the concept of CSR (Corporate Social Responsibility), SLO (Social License to Operate) was put forward in the 1990s. However, the concept of SLO has been used only in a limited range of industries, such as the mining industry, and has not been well developed or utilized in cases like site selection analysis and satisfaction surveys. In this study, SLO will be explained and tested in the food industry, and a specific survey will be done to analyze the feasibility of this concept as well as the crucial factors that influence the assessment of social satisfaction. There is ample evidence suggesting that the SLO, as a measurement of social satisfaction, is quite supportive in decision making for food industry companies.

    The Experimental Study of Increased ICP on Cerebral Hemorrhage Rabbits with Magnetic Induction Phase Shift Method

    Introduction: Measuring magnetic induction phase shift (MIPS) changes as a function of cerebral hemorrhage volume has the potential to be a simple method for primary and non-contact detection of the occurrence and progress of cerebral hemorrhage. Our previous MIPS study used intracranial pressure (ICP) as a comparison index and found a preliminary correlation between MIPS and ICP. Materials and Methods: In this study, we theoretically deduced the approximate relationship between MIPS and ICP and carried out a comparison study between MIPS and ICP on cerebral hemorrhage in rabbits. Acute cerebral hemorrhage was induced by injecting autologous blood (3 to 6 mL) into the brain of rabbits in the experimental group (n=7). Results: The animal experiment results showed that the MIPS decreased significantly as a function of injection volume in the experimental group, and that the changes of ICP and MIPS in rabbits from the experimental group were negatively correlated. We also found that the MIPS slopes of all experimental samples changed from fast to slow, the reverse of the trend in ICP. Conclusion: These observations suggest that the non-contact MIPS method might be valuable for monitoring acute cerebral hemorrhage and obtaining ICP information.